Welcome to the Week 4 assignment! In this lab assignment, you will learn about Neural Style Transfer, an algorithm created by Gatys et al. (2015).
Upon completion of this assignment, you will be able to:
Most of the algorithms you've studied optimize a cost function to get a set of parameter values. With Neural Style Transfer, you'll get to optimize a cost function to get pixel values. Exciting!
Before submitting your assignment to the AutoGrader, please make sure you are not doing the following:
print statement(s) in the assignment.
If you do any of the above, you will get an error such as Grader Error: Grader feedback not found (or a similarly unexpected error) upon submitting your assignment. Before asking for help or debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these instructions.
Run the following code cell to import the necessary packages and dependencies you will need to perform Neural Style Transfer.
### v1.2
import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
import numpy as np
import tensorflow as tf
import pprint
from public_tests import *
%matplotlib inline
Neural Style Transfer (NST) is one of the most fun and interesting optimization techniques in deep learning. It merges two images, namely: a "content" image (C) and a "style" image (S), to create a "generated" image (G). The generated image G combines the "content" of the image C with the "style" of image S.
In this assignment, you are going to combine the Louvre museum in Paris (content image C) with the impressionist style of Claude Monet (style image S) to generate the following image:

Let's get started!
Neural Style Transfer (NST) uses a previously trained convolutional network, and builds on top of that. The idea of using a network trained on a different task and applying it to a new task is called transfer learning.
Following the original NST paper, you will be using the VGG network, which is named after the Visual Geometry Group at the University of Oxford that published it in 2014. Specifically, you'll use VGG-19, a 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and has learned to recognize a variety of low-level features (at the shallower layers) and high-level features (at the deeper layers).
Run the following code to load parameters from the VGG model. This may take a few seconds.
tf.random.set_seed(272) # DO NOT CHANGE THIS VALUE
pp = pprint.PrettyPrinter(indent=4)
img_size = 400
vgg = tf.keras.applications.VGG19(include_top=False,
                                  input_shape=(img_size, img_size, 3),
                                  weights='pretrained-model/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5')
vgg.trainable = False
pp.pprint(vgg)
<keras.engine.functional.Functional object at 0x7fb7bc6d7f70>
Next, you will be building the Neural Style Transfer (NST) algorithm in three steps:
One goal you should aim for when performing NST is for the content in generated image G to match the content of image C. To do so, you'll need an understanding of shallow versus deep layers:
You need the "generated" image G to have similar content as the input image C. Suppose you have chosen some layer's activations to represent the content of an image.
In this running example, the content image C will be the picture of the Louvre Museum in Paris. Run the code below to see a picture of the Louvre.
content_image = Image.open("images/louvre.jpg")
print("The content image (C) shows the Louvre museum's pyramid surrounded by old Paris buildings, against a sunny sky with a few clouds.")
content_image
The content image (C) shows the Louvre museum's pyramid surrounded by old Paris buildings, against a sunny sky with a few clouds.
One goal you should aim for when performing NST is for the content in generated image G to match the content of image C. A method to achieve this is to calculate the content cost function, which will be defined as:
$$J_{content}(C,G) = \frac{1}{4 \times n_H \times n_W \times n_C}\sum _{ \text{all entries}} (a^{(C)} - a^{(G)})^2\tag{1} $$
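To make equation (1) concrete, here is a minimal sketch that evaluates it on toy activations. It uses NumPy rather than TensorFlow purely for illustration; the function name `content_cost_np` and the tiny input arrays are assumptions, not part of the assignment.

```python
import numpy as np

def content_cost_np(a_C, a_G):
    """NumPy version of equation (1); a_C and a_G have shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    return np.sum((a_C - a_G) ** 2) / (4.0 * n_H * n_W * n_C)

# Hypothetical toy activations: a 2 x 2 spatial grid with 3 channels.
a_C = np.zeros((1, 2, 2, 3))
a_G = np.ones((1, 2, 2, 3))

# 12 entries, each with squared difference 1, so J = 12 / (4 * 2 * 2 * 3) = 0.25
J = content_cost_np(a_C, a_G)
print(J)

# Unrolling with an explicit size or with -1 yields the same array,
# and the sum over all entries is unaffected by the unrolling.
unrolled_explicit = a_G.reshape(1, 2 * 2, 3)
unrolled_inferred = a_G.reshape(1, -1, 3)
print(np.array_equal(unrolled_explicit, unrolled_inferred))
```

Because the cost sums over every entry, reshaping the activations changes only their layout, not the value of the cost.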
Compute the "content cost" using TensorFlow.
Instructions:
a_G: hidden layer activations representing content of the image G
a_C: hidden layer activations representing content of the image C
The 3 steps to implement this function are:
Retrieve dimensions from a_G: to retrieve dimensions from a tensor X, use: X.get_shape().as_list()
Unroll a_C and a_G as explained in the picture above.
Compute the content cost.
tf.reshape(tensor, shape) takes a list of integers that represent the desired output shape.
For the shape parameter, a -1 tells the function to choose the correct dimension size so that the output tensor still contains all the values of the original tensor.
So tf.reshape(a_C, shape=[m, n_H * n_W, n_C]) gives the same result as tf.reshape(a_C, shape=[m, -1, n_C]).
To reorder the dimensions, you can use tf.transpose(tensor, perm), where perm is a list of integers containing the original index of the dimensions. For example, tf.transpose(a_C, perm=[0,3,1,2]) changes the dimensions from $(m, n_H, n_W, n_C)$ to $(m, n_C, n_H, n_W)$.
You don't necessarily need tf.transpose to 'unroll' the tensors in this case, but this is a useful function to practice and understand for other situations that you'll encounter.
# UNQ_C1
# GRADED FUNCTION: compute_content_cost
def compute_content_cost(content_output, generated_output):
    """
    Computes the content cost
    Arguments:
    content_output -- list of layer activations for the image C; its last element a_C is a tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C
    generated_output -- list of layer activations for the image G; its last element a_G is a tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G
    Returns:
    J_content -- scalar that you compute using equation 1 above.
    """
    a_C = content_output[-1]
    a_G = generated_output[-1]
    ### START CODE HERE
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    # Reshape a_C and a_G (≈2 lines)
    a_C_unrolled = tf.reshape(a_C, shape=[m, n_H * n_W, n_C]) # Or tf.reshape(a_C, shape=[m, -1 , n_C])
    a_G_unrolled = tf.reshape(a_G, shape=[m, n_H * n_W, n_C]) # Or tf.reshape(a_G, shape=[m, -1 , n_C])
    # compute the cost with tensorflow (≈1 line)
    J_content = tf.reduce_sum(tf.square(a_C_unrolled - a_G_unrolled))/(4.0 * n_H * n_W * n_C)
    ### END CODE HERE
    return J_content
### you cannot edit this cell
compute_content_cost_test(compute_content_cost)
J_content = tf.Tensor(7.056877, shape=(), dtype=float32)
All tests passed
Expected Output:
| J_content | 7.0568767 |
Congrats! You've now successfully calculated the content cost function!
What you should remember:
example = Image.open("images/monet_800600.jpg")
example
This was painted in the style of [impressionism](https://en.wikipedia.org/wiki/Impressionism).
Now let's see how you can now define a "style" cost function $J_{style}(S,G)$!
You will compute the Style matrix by multiplying the "unrolled" filter matrix with its transpose:

The result is a matrix of dimension $(n_C,n_C)$ where $n_C$ is the number of filters (channels). The value $G_{(gram)i,j}$ measures how similar the activations of filter $i$ are to the activations of filter $j$.
By capturing the prevalence of different types of features ($G_{(gram)ii}$), as well as how much different features occur together ($G_{(gram)ij}$), the Style matrix $G_{gram}$ measures the style of an image.
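A small worked example may help with reading a Gram matrix. The sketch below uses NumPy in place of TensorFlow, and the function name `gram_matrix_np` and the 3-filter activation matrix are invented for illustration.

```python
import numpy as np

def gram_matrix_np(A):
    """Gram matrix of A, where A has shape (n_C, n_H * n_W)."""
    return A @ A.T

# Hypothetical unrolled activations: 3 filters, 4 spatial positions.
A = np.array([
    [1.0, 0.0, 2.0, 0.0],   # filter 0
    [2.0, 0.0, 4.0, 0.0],   # filter 1: active wherever filter 0 is active
    [0.0, 3.0, 0.0, 1.0],   # filter 2: active only where the others are silent
])

G_gram = gram_matrix_np(A)
print(G_gram)
# [[ 5. 10.  0.]
#  [10. 20.  0.]
#  [ 0.  0. 10.]]
```

The diagonal entries (5, 20, 10) grow with how strongly each filter fires overall. The off-diagonal entry for filters 0 and 1 is large because they fire together, while the entries involving filter 2 are zero because it never fires at the same positions as the other two.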
# UNQ_C2
# GRADED FUNCTION: gram_matrix
def gram_matrix(A):
    """
    Argument:
    A -- matrix of shape (n_C, n_H*n_W)
    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """
    ### START CODE HERE
    #(≈1 line)
    GA = tf.matmul(A, tf.transpose(A))
    ### END CODE HERE
    return GA
### you cannot edit this cell
gram_matrix_test(gram_matrix)
GA =
tf.Tensor(
[[ 63.193256 -26.729713 -7.732155 ]
[-26.729713 12.775055 -2.5164719]
[ -7.732155 -2.5164719 23.746586 ]], shape=(3, 3), dtype=float32)
All tests passed
Expected Output:
| GA | [[ 63.193256 -26.729713 -7.732155 ] [-26.729713 12.775055 -2.5164719] [ -7.732155 -2.5164719 23.746586 ]] |
You now know how to calculate the Gram matrix. Congrats! Your next goal will be to minimize the distance between the Gram matrix of the "style" image S and the Gram matrix of the "generated" image G.
Compute the style cost for a single layer.
Instructions: The 3 steps to implement this function are:
To retrieve dimensions from a tensor X, use: X.get_shape().as_list()
tf.transpose can be used to change the order of the filter dimension.
# UNQ_C3
# GRADED FUNCTION: compute_layer_style_cost
def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G
    Returns:
    J_style_layer -- tensor representing a scalar value, style cost defined above by equation (2)
    """
    ### START CODE HERE
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    # Reshape the images to have them of shape (n_C, n_H*n_W) (≈2 lines)
    a_S = tf.transpose(tf.reshape(a_S, shape=[-1, n_C]))
    # OR a_S = tf.transpose(tf.reshape(a_S, shape=[ n_H * n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, shape=[-1, n_C]))
    # Computing gram_matrices for both images S and G (≈2 lines)
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)
    # Computing the loss (≈1 line)
    J_style_layer = tf.reduce_sum(tf.square(GS - GG))/(4.0 *(( n_H * n_W * n_C)**2))
    ### END CODE HERE
    return J_style_layer
### you cannot edit this cell
compute_layer_style_cost_test(compute_layer_style_cost)
J_style_layer = tf.Tensor(14.01649, shape=(), dtype=float32)
All tests passed
Expected Output:
| J_style_layer | 14.01649 |
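One property of this cost is worth seeing in numbers: because the Gram matrix sums over all spatial positions, the layer style cost does not depend on where in the image a feature occurs. The following sketch checks this with NumPy standing in for TensorFlow. The helper `layer_style_cost_np`, the random activations and the seed are assumptions made for illustration only.

```python
import numpy as np

def layer_style_cost_np(a_S, a_G):
    """NumPy version of the single-layer style cost for inputs of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Unroll to (n_C, n_H * n_W), mirroring the reshape + transpose used in the notebook.
    S = a_S.reshape(-1, n_C).T
    G = a_G.reshape(-1, n_C).T
    GS, GG = S @ S.T, G @ G.T
    return np.sum((GS - GG) ** 2) / (4.0 * (n_H * n_W * n_C) ** 2)

rng = np.random.default_rng(0)            # arbitrary seed for the toy data
a_S = rng.normal(size=(1, 4, 4, 3))       # hypothetical "style" activations

# Shuffle the 16 spatial positions, keeping each position's channel vector intact.
perm = rng.permutation(16)
a_shuffled = a_S.reshape(16, 3)[perm].reshape(1, 4, 4, 3)

cost_shuffled = layer_style_cost_np(a_S, a_shuffled)   # essentially zero: same Gram matrix
cost_scaled = layer_style_cost_np(a_S, 2.0 * a_S)      # positive: Gram matrix is scaled by 4
print(cost_shuffled, cost_scaled)
```

The shuffled activations describe a very different spatial layout, yet the cost is zero up to floating-point rounding, which is why this cost captures texture-like statistics rather than content.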
Start by listing the layer names:
for layer in vgg.layers:
    print(layer.name)
input_1 block1_conv1 block1_conv2 block1_pool block2_conv1 block2_conv2 block2_pool block3_conv1 block3_conv2 block3_conv3 block3_conv4 block3_pool block4_conv1 block4_conv2 block4_conv3 block4_conv4 block4_pool block5_conv1 block5_conv2 block5_conv3 block5_conv4 block5_pool
Take a look at the output of the layer block5_conv4. You will later define this as the content layer, which will represent the image.
vgg.get_layer('block5_conv4').output
<KerasTensor: shape=(None, 25, 25, 512) dtype=float32 (created by layer 'block5_conv4')>
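The spatial size of 25 in this output shape follows from the input size. As a quick check, assuming the standard VGG-19 layout in which convolutions preserve height and width and each pooling layer halves them, the arithmetic can be sketched as:

```python
img_size = 400            # same value as img_size in the notebook
pools_before_block5 = 4   # block1_pool through block4_pool precede block5_conv4

size = img_size
for _ in range(pools_before_block5):
    size //= 2            # each 2x2, stride-2 max-pool halves height and width

print(size)  # 25
```

The sequence is 400, 200, 100, 50, 25, which matches the (None, 25, 25, 512) shape reported for block5_conv4.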
Now choose layers to represent the style of the image and assign a weight to the style cost of each layer:
STYLE_LAYERS = [
('block1_conv1', 0.2),
('block2_conv1', 0.2),
('block3_conv1', 0.2),
('block4_conv1', 0.2),
('block5_conv1', 0.2)]
You can combine the style costs for different layers as follows:
$$J_{style}(S,G) = \sum_{l} \lambda^{[l]} J^{[l]}_{style}(S,G)$$
where the values for $\lambda^{[l]}$ are given in STYLE_LAYERS.
Compute style cost
Instructions:
The compute_style_cost function provided below calls compute_layer_style_cost(...) several times, and weights their results using the values in STYLE_LAYERS.
In compute_style_cost, for each layer:
Once you're done with the loop:
### you cannot edit this cell
def compute_style_cost(style_image_output, generated_image_output, STYLE_LAYERS=STYLE_LAYERS):
    """
    Computes the overall style cost from several chosen layers
    Arguments:
    style_image_output -- list of layer activations for the style image S; the last element is the content layer
    generated_image_output -- list of layer activations for the generated image G; the last element is the content layer
    STYLE_LAYERS -- A python list containing:
                        - the names of the layers we would like to extract style from
                        - a coefficient for each of them
    Returns:
    J_style -- tensor representing a scalar value, the weighted sum of the per-layer style costs
    """
    # initialize the overall style cost
    J_style = 0
    # Set a_S to be the hidden layer activation from the layer we have selected.
    # The last element of the array contains the content layer image, which must not be used.
    a_S = style_image_output[:-1]
    # Set a_G to be the output of the chosen hidden layers.
    # The last element of the list contains the content layer image which must not be used.
    a_G = generated_image_output[:-1]
    for i, weight in zip(range(len(a_S)), STYLE_LAYERS):
        # Compute style_cost for the current layer
        J_style_layer = compute_layer_style_cost(a_S[i], a_G[i])
        # Add weight * J_style_layer of this layer to overall style cost
        J_style += weight[1] * J_style_layer
    return J_style
How do you choose the coefficients for each layer? The deeper layers capture higher-level concepts, and the features in the deeper layers are less localized in the image relative to each other. So if you want the generated image to softly follow the style image, try choosing larger weights for deeper layers and smaller weights for the first layers. In contrast, if you want the generated image to strongly follow the style image, try choosing smaller weights for deeper layers and larger weights for the first layers.
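To see how the coefficients shift the overall style cost, the sketch below applies three weightings to the same set of per-layer costs. The per-layer cost values and the two non-uniform weightings are invented for illustration; only the uniform 0.2 weights come from STYLE_LAYERS.

```python
STYLE_LAYERS = [
    ('block1_conv1', 0.2),
    ('block2_conv1', 0.2),
    ('block3_conv1', 0.2),
    ('block4_conv1', 0.2),
    ('block5_conv1', 0.2)]

# Hypothetical per-layer style costs, shallowest layer first (made-up numbers).
layer_costs = [8.0, 6.0, 4.0, 2.0, 1.0]

def weighted_style_cost(layer_costs, weights):
    """Weighted sum of per-layer style costs, one weight per layer."""
    return sum(w * c for w, c in zip(weights, layer_costs))

uniform = [w for _, w in STYLE_LAYERS]
shallow_heavy = [0.4, 0.3, 0.15, 0.1, 0.05]   # assumed weights favouring the first layers
deep_heavy = shallow_heavy[::-1]              # assumed weights favouring the deeper layers

J_uniform = weighted_style_cost(layer_costs, uniform)        # about 4.2
J_shallow = weighted_style_cost(layer_costs, shallow_heavy)  # about 5.85
J_deep = weighted_style_cost(layer_costs, deep_heavy)        # about 2.6
print(J_uniform, J_shallow, J_deep)
```

With these made-up costs, the weighting decides which layers dominate the gradient that reaches the generated image, which is the lever the paragraph above describes.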
What you should remember:
Finally, you will create a cost function that minimizes both the style and the content cost. The formula is:
$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$$
Implement the total cost function which includes both the content cost and the style cost.
# UNQ_C4
# GRADED FUNCTION: total_cost
@tf.function()
def total_cost(J_content, J_style, alpha = 10, beta = 40):
    """
    Computes the total cost function
    Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost
    Returns:
    J -- total cost as defined by the formula above.
    """
    ### START CODE HERE
    #(≈1 line)
    J = alpha * J_content + beta * J_style
    ### END CODE HERE
    return J
### you cannot edit this cell
total_cost_test(total_cost)
J = tf.Tensor(32.9832, shape=(), dtype=float32)
All tests passed
Expected Output:
| J | 32.9832 |
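As a quick arithmetic illustration of how alpha and beta trade off the two terms, here is a plain-Python version of the formula. The name `total_cost_py` and the cost values are hypothetical and chosen only so the numbers are easy to follow.

```python
def total_cost_py(J_content, J_style, alpha=10, beta=40):
    """Plain-Python version of J = alpha * J_content + beta * J_style."""
    return alpha * J_content + beta * J_style

# Hypothetical cost values, not taken from the assignment.
J_content, J_style = 0.5, 0.25

default = total_cost_py(J_content, J_style)                           # 10*0.5 + 40*0.25 = 15.0
content_heavy = total_cost_py(J_content, J_style, alpha=40, beta=10)  # 40*0.5 + 10*0.25 = 22.5
print(default, content_heavy)
```

With the default weights, the style term contributes 10.0 of the 15.0 total even though J_style is the smaller of the two raw costs.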
What you should remember:
Finally, you get to put everything together to implement Neural Style Transfer!
Here's what your program will be able to do:
Here are the individual steps in detail.
Run the following code cell to load, reshape, and normalize your "content" image C (the Louvre museum picture):
content_image = np.array(Image.open("images/louvre_small.jpg").resize((img_size, img_size)))
content_image = tf.constant(np.reshape(content_image, ((1,) + content_image.shape)))
print(content_image.shape)
imshow(content_image[0])
plt.show()
(1, 400, 400, 3)
Now load, reshape and normalize your "style" image (Claude Monet's painting):
style_image = np.array(Image.open("images/monet.jpg").resize((img_size, img_size)))
style_image = tf.constant(np.reshape(style_image, ((1,) + style_image.shape)))
print(style_image.shape)
imshow(style_image[0])
plt.show()
(1, 400, 400, 3)
Now, you get to initialize the "generated" image as a noisy image created from the content_image.
generated_image = tf.Variable(tf.image.convert_image_dtype(content_image, tf.float32))
noise = tf.random.uniform(tf.shape(generated_image), -0.25, 0.25)
generated_image = tf.add(generated_image, noise)
generated_image = tf.clip_by_value(generated_image, clip_value_min=0.0, clip_value_max=1.0)
print(generated_image.shape)
imshow(generated_image.numpy()[0])
plt.show()
(1, 400, 400, 3)
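The initialization above can be mirrored in NumPy to show what each step does to the pixel values. This is a sketch under assumptions: a tiny random array stands in for the content image, and dividing by 255 stands in for the uint8-to-float32 scaling that tf.image.convert_image_dtype performs.

```python
import numpy as np

rng = np.random.default_rng(272)   # seed chosen to echo the notebook; any seed works

# Stand-in for the uint8 content image: 4 x 4 pixels instead of 400 x 400.
content = rng.integers(0, 256, size=(1, 4, 4, 3), dtype=np.uint8)

# Scale uint8 values in [0, 255] to float32 values in [0, 1].
base = content.astype(np.float32) / 255.0

# Add uniform noise in [-0.25, 0.25), then clip back into the valid pixel range.
noise = rng.uniform(-0.25, 0.25, size=base.shape).astype(np.float32)
generated = np.clip(base + noise, 0.0, 1.0)

print(generated.shape, generated.dtype)
print(float(generated.min()), float(generated.max()))
```

Every generated pixel stays within 0.25 of the corresponding content pixel, so the starting image is a perturbed copy of the content rather than pure noise.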
def get_layer_outputs(vgg, layer_names):
    """ Creates a vgg model that returns a list of intermediate output values."""
    outputs = [vgg.get_layer(layer[0]).output for layer in layer_names]
    model = tf.keras.Model([vgg.input], outputs)
    return model
Now, define the content layer and build the model.
content_layer = [('block5_conv4', 1)]
vgg_model_outputs = get_layer_outputs(vgg, STYLE_LAYERS + content_layer)
Save the outputs for the content and style layers in separate variables.
content_target = vgg_model_outputs(content_image) # Content encoder
style_targets = vgg_model_outputs(style_image) # Style encoder
You've built the model, and now to compute the content cost, you will encode your content image using the appropriate hidden layer activations. Set this encoding to the variable a_C. Later in the assignment, you will need to do the same for the generated image, by setting the variable a_G to be the appropriate hidden layer activations. You will use layer block5_conv4 to compute the encoding. The code below does the following:
# Assign the content image to be the input of the VGG model.
# Set a_C to be the hidden layer activation from the layer we have selected
preprocessed_content = tf.Variable(tf.image.convert_image_dtype(content_image, tf.float32))
a_C = vgg_model_outputs(preprocessed_content)
The code below sets a_S to be the tensor giving the hidden layer activation for STYLE_LAYERS using our style image.
# Assign the input of the model to be the "style" image
preprocessed_style = tf.Variable(tf.image.convert_image_dtype(style_image, tf.float32))
a_S = vgg_model_outputs(preprocessed_style)
Below are the utils that you will need to display the images generated by the style transfer model.
def clip_0_1(image):
    """
    Truncate all the pixels in the tensor to be between 0 and 1
    Arguments:
    image -- Tensor
    Returns:
    Tensor
    """
    return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
def tensor_to_image(tensor):
    """
    Converts the given tensor into a PIL image
    Arguments:
    tensor -- Tensor
    Returns:
    Image: A PIL image
    """
    tensor = tensor * 255
    tensor = np.array(tensor, dtype=np.uint8)
    if np.ndim(tensor) > 3:
        assert tensor.shape[0] == 1
        tensor = tensor[0]
    return Image.fromarray(tensor)
Implement the train_step() function for transfer learning
Compute the total cost J.
Use alpha = 10 and beta = 40.
# UNQ_C5
# GRADED FUNCTION: train_step
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
@tf.function()
def train_step(generated_image):
    with tf.GradientTape() as tape:
        # In this function you must use the precomputed encoded images a_S and a_C
        # Compute a_G as the vgg_model_outputs for the current generated image
        ### START CODE HERE
        #(1 line)
        a_G = vgg_model_outputs(generated_image)
        # Compute the style cost
        #(1 line)
        J_style = compute_style_cost(a_S, a_G)
        #(2 lines)
        # Compute the content cost
        J_content = compute_content_cost(a_C, a_G)
        # Compute the total cost
        J = total_cost(J_content, J_style)
        ### END CODE HERE
    grad = tape.gradient(J, generated_image)
    optimizer.apply_gradients([(grad, generated_image)])
    generated_image.assign(clip_0_1(generated_image))
    # For grading purposes
    return J
### you cannot edit this cell
# You always must run the last cell before this one. You will get an error if not.
generated_image = tf.Variable(generated_image)
train_step_test(train_step, generated_image)
tf.Tensor(25700.346, shape=(), dtype=float32)
tf.Tensor(17778.377, shape=(), dtype=float32)
All tests passed
Expected output
tf.Tensor(25700.346, shape=(), dtype=float32)
tf.Tensor(17778.389, shape=(), dtype=float32)
Looks like it's working! Now you'll get to put it all together into one function to better see your results!
Run the following cell to generate an artistic image. It should take about 3min on a GPU for 2500 iterations. Neural Style Transfer is generally trained using GPUs.
If you increase the learning rate you can speed up the style transfer, but often at the cost of quality.
# Show the generated image at some epochs
# Uncomment to reset the style transfer process. You will need to compile the train_step function again
epochs = 2501
for i in range(epochs):
    train_step(generated_image)
    if i % 250 == 0:
        print(f"Epoch {i} ")
    if i % 250 == 0:
        image = tensor_to_image(generated_image)
        imshow(image)
        image.save(f"output/image_{i}.jpg")
        plt.show()
Epoch 0
Epoch 250
Epoch 500
Epoch 750
Epoch 1000
Epoch 1250
Epoch 1500
Epoch 1750
Epoch 2000
Epoch 2250
Epoch 2500
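The loop above updates pixels, not network weights. A stripped-down sketch of that idea follows, with two deliberate substitutions that are assumptions of this example: plain gradient descent replaces Adam, and a simple squared-error cost against a target image replaces the NST cost J. NumPy stands in for TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)                        # arbitrary seed
target = rng.uniform(0.0, 1.0, size=(8, 8, 3))        # hypothetical target pixels
image = np.clip(target + rng.uniform(-0.25, 0.25, size=target.shape), 0.0, 1.0)

def cost_and_grad(image, target):
    """Squared-error cost 0.5 * sum((image - target)^2) and its gradient w.r.t. the pixels."""
    diff = image - target
    return 0.5 * np.sum(diff ** 2), diff

learning_rate = 0.1
initial_cost, _ = cost_and_grad(image, target)

for step in range(100):
    _, grad = cost_and_grad(image, target)
    # The pixels themselves are the optimization variable; clip to keep them valid.
    image = np.clip(image - learning_rate * grad, 0.0, 1.0)

final_cost, _ = cost_and_grad(image, target)
print(initial_cost, final_cost)
```

Each step multiplies the pixel error by 0.9, so the cost shrinks steadily. In the notebook the same update pattern is driven by the gradient of J through the frozen VGG network.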
Now, run the following code cell to see the results!
# Show the 3 images in a row
fig = plt.figure(figsize=(16, 4))
ax = fig.add_subplot(1, 3, 1)
imshow(content_image[0])
ax.title.set_text('Content image')
ax = fig.add_subplot(1, 3, 2)
imshow(style_image[0])
ax.title.set_text('Style image')
ax = fig.add_subplot(1, 3, 3)
imshow(generated_image[0])
ax.title.set_text('Generated image')
plt.show()
Look at that! You did it! The Generated image on the right is the image you generated with the model.
Note 1: These are the results of training the model with learning_rate=0.01 (which was set in Ex 6) for epochs = 2501 (set in Section 5.6). If you want to look at the intermediate results saved every 250 epochs, click on File --> Open..., then go to the /output directory to see all of the saved images.
Note 2: The hyperparameters (learning_rate=0.01 and epochs = 2501) were set to these values so that you didn't have to wait too long to see an initial result. To get the best looking results, you may want to try running the optimization algorithm longer (and perhaps with a smaller learning rate). After completing, submitting and getting your desired grade for this assignment with these hyperparameters (learning_rate=0.01 and epochs = 2501), you can come back and play more with this notebook, and see if you can generate even better looking images. Running for around epochs = 20000 with a learning_rate=0.001, you should see something like the image presented below on the right. But first, give yourself a pat on the back for finishing this long assignment!

Here are a few other examples:
The beautiful ruins of the ancient city of Persepolis (Iran) with the style of Van Gogh (The Starry Night)

The tomb of Cyrus the Great in Pasargadae with the style of a Ceramic Kashi from Ispahan.

A scientific study of a turbulent fluid with the style of an abstract blue fluid painting.

If you don't plan on continuing to the next Optional section, help us to provide our learners a smooth learning experience, by freeing up the resources used by your assignment by running the cell below so that the other learners can take advantage of those resources just as much as you did. Thank you!
Note:
Ok.
restart the kernel.
%%javascript
IPython.notebook.save_checkpoint();
if (confirm("Clear memory?") == true)
{
    IPython.notebook.kernel.restart();
}
Finally, you can also rerun the algorithm on your own images!
To do so, go back to Section 5 and change the content image and style image with your own pictures. In detail, here's what you should do:
Click on File -> Open in the upper tab of the notebook.
Go to /images and upload your images (images will be scaled to 400x400, but you can change that parameter too in section 2), and rename them my_content.jpg and my_style.jpg, for example.
Change the code from:
content_image = np.array(Image.open("images/louvre_small.jpg").resize((img_size, img_size)))
style_image = np.array(Image.open("images/monet.jpg").resize((img_size, img_size)))
to:
content_image = np.array(Image.open("images/my_content.jpg").resize((img_size, img_size)))
style_image = np.array(Image.open("images/my_style.jpg").resize((img_size, img_size)))
You can share your generated images with us on social media with the hashtag #deeplearningAI or by tagging us directly!
Here are some ideas on how to tune your hyperparameters:
STYLE_LAYERS
epochs given in Section 5.6.
Happy coding!
In order to provide our learners a smooth learning experience, please free up the resources used by your assignment by running the cell below so that the other learners can take advantage of those resources just as much as you did. Thank you!
Note:
Ok.
restart the kernel.
%%javascript
IPython.notebook.save_checkpoint();
if (confirm("Clear memory?") == true)
{
    IPython.notebook.kernel.restart();
}
Great job on completing this assignment! You are now able to use Neural Style Transfer to generate artistic images. This is also your first time building a model in which the optimization algorithm updates the pixel values rather than the neural network's parameters. Deep learning has many different types of models and this is only one of them!
This was the final programming exercise of this course. Congratulations - you've finished all the programming exercises of this course on Convolutional Networks! See you in Course 5, Sequence Models!
The Neural Style Transfer algorithm is due to Gatys et al. (2015). Harish Narayanan and GitHub user "log0" also have highly readable write-ups that inspired this lab. The pre-trained network used in this implementation is a VGG network, which is due to Simonyan and Zisserman (2015). Pre-trained weights were from the work of the MatConvNet team.